Install packages

To create our visualizations, we first need to install and load the packages used in this project.

> install.packages("jcolors",repos = "http://cran.us.r-project.org")
Installing package into 'C:/Users/anast/Documents/R/win-library/3.5'
(as 'lib' is unspecified)
package 'jcolors' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\anast\AppData\Local\Temp\RtmpWA0bYk\downloaded_packages
> install.packages("viridisLite",repos = "http://cran.us.r-project.org")
Installing package into 'C:/Users/anast/Documents/R/win-library/3.5'
(as 'lib' is unspecified)
package 'viridisLite' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\anast\AppData\Local\Temp\RtmpWA0bYk\downloaded_packages
> install.packages("viridis",repos = "http://cran.us.r-project.org")
Installing package into 'C:/Users/anast/Documents/R/win-library/3.5'
(as 'lib' is unspecified)
package 'viridis' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\anast\AppData\Local\Temp\RtmpWA0bYk\downloaded_packages
> install.packages("ggmap",repos = "http://cran.us.r-project.org")
Installing package into 'C:/Users/anast/Documents/R/win-library/3.5'
(as 'lib' is unspecified)
package 'ggmap' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\anast\AppData\Local\Temp\RtmpWA0bYk\downloaded_packages
> library("tidyverse")
-- Attaching packages ---------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --
v ggplot2 3.1.1       v purrr   0.3.2  
v tibble  2.1.1       v dplyr   0.8.0.1
v tidyr   0.8.3       v stringr 1.4.0  
v readr   1.3.1       v forcats 0.4.0  
-- Conflicts ------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
> library("ggmap")
Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
Please cite ggmap if you use it! See citation("ggmap") for details.
> library("jcolors")
> library("viridisLite")
> library("viridis")
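
Calling install.packages() again downloads a package even when it is already installed. As a minimal sketch, the snippet below installs only what is missing. The helper name missing_packages() is made up for this illustration, and the HTTPS mirror URL is an assumption rather than the one used in the transcript above.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

pkgs <- c("tidyverse", "ggmap", "jcolors", "viridisLite", "viridis")
to_install <- missing_packages(pkgs)

# Only download in an interactive session, and only what is missing.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```

The library() calls are still needed afterwards, since installing a package does not attach it to the session.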

Cleaning Data

Before we can visualize the data using ‘ggplot2’, we have to clean it.

We are going to load the data into what the package authors call a ‘data frame tbl’ or ‘tbl_df’, so it is easier to work with. This can be done with the tbl_df() function.

> crime <- tbl_df(crime)
> head(crime)
# A tibble: 6 x 26
  Incident.ID Offence.Code CR.Number Dispatch.Date.T~
        <int> <fct>            <int> <fct>           
1   201114095 2501          16066978 12/29/2016 03:5~
2   201117694 1103          17004063 01/23/2017 10:5~
3   201090098 2203          16036935 07/21/2016 01:2~
4   201090098 2901          16036935 07/21/2016 01:2~
5   201089849 3522          16036571 07/19/2016 04:3~
6   201089849 3550          16036571 07/19/2016 04:3~
# ... with 22 more variables: NIBRS.Code <fct>, Victims <int>,
#   Crime.Name1 <fct>, Crime.Name2 <fct>, Crime.Name3 <fct>,
#   Police.District.Name <fct>, Block_Address <fct>,
#   City <fct>, State <fct>, Zip.Code <int>, Agency <fct>,
#   Place <fct>, Sector <fct>, Beat <fct>, PRA <fct>,
#   Address.Number <int>, Start.Date.Time <fct>,
#   End.Date.Time <fct>, Latitude <dbl>, Longitude <dbl>,
#   Police.District.Number <fct>, Location <fct>
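
Later dplyr releases deprecate tbl_df() in favor of as_tibble(). Here is a minimal sketch of the same conversion, run on a made-up three-row data frame (the values are invented for illustration and are not taken from the data set):

```r
library(tibble)

# Toy stand-in for the crime data; values are invented for illustration.
crime_demo <- data.frame(
  Crime.Name1 = c("Crime Against Property", "Crime Against Person", "Other"),
  Latitude    = c(39.1, 39.2, 39.0),
  Longitude   = c(-77.1, -77.2, -77.0)
)

# as_tibble() converts a data frame to a tibble, like tbl_df() does above.
crime_tbl <- as_tibble(crime_demo)
class(crime_tbl)
```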

We are only interested in some of the variables, so we can use select() to keep just the columns we will be looking at.

> crime <- select(crime, Crime.Name1:Crime.Name3, Latitude, Longitude)
> head(crime)
# A tibble: 6 x 5
  Crime.Name1   Crime.Name2    Crime.Name3   Latitude Longitude
  <fct>         <fct>          <fct>            <dbl>     <dbl>
1 Crime Agains~ Counterfeitin~ FORGERY OF C~     39.1     -77.1
2 Crime Agains~ Forcible Rape  RAPE - STRON~     39.1     -77.1
3 Crime Agains~ Burglary/Brea~ BURGLARY - F~     39.2     -77.2
4 Crime Agains~ Destruction/D~ DAMAGE PROPE~     39.2     -77.2
5 Crime Agains~ Drug/Narcotic~ DRUGS - OPIU~     39.2     -77.3
6 Crime Agains~ Drug Equipmen~ DRUGS - NARC~     39.2     -77.3
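
The `Crime.Name1:Crime.Name3` part of the call is a column range: select() keeps every column from the first name through the last, in the order they appear. A small sketch on an invented two-row data frame (the column names follow the data set, the values do not):

```r
library(dplyr)

# Toy data; only the column names mirror the real data set.
toy <- data.frame(
  Incident.ID = 1:2,
  Crime.Name1 = c("Crime Against Property", "Crime Against Person"),
  Crime.Name2 = c("Burglary", "Assault"),
  Crime.Name3 = c("BURGLARY - DEMO", "ASSAULT - DEMO"),
  City        = c("DEMO CITY A", "DEMO CITY B"),
  Latitude    = c(39.1, 39.0),
  Longitude   = c(-77.1, -77.1)
)

# The range picks Crime.Name1, Crime.Name2 and Crime.Name3;
# Latitude and Longitude are added by name, and everything else is dropped.
kept <- select(toy, Crime.Name1:Crime.Name3, Latitude, Longitude)
names(kept)
```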

Next, we will remove the rows with missing values and keep only the rows that belong to certain types of crimes. This can be done with the na.omit() and filter() functions, respectively.

> crime_filtered <- na.omit(crime)
> crime_filtered <- filter(crime_filtered, Crime.Name1 == "Other" | Crime.Name1 == "Crime Against Property" | Crime.Name1 == "Crime Against Society" |Crime.Name1 == "Not a Crime" | Crime.Name1 == "Crime Against Person")
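
The chain of == comparisons joined by | can also be written with %in%, which tends to be easier to read and extend. A sketch on invented data, assuming the same five category names as above:

```r
library(dplyr)

# Toy data with a missing coordinate, a missing category and an empty category.
toy <- data.frame(
  Crime.Name1 = c("Crime Against Property", "Other", "", NA, "Crime Against Person"),
  Latitude    = c(39.1, NA, 39.0, 39.2, 39.1),
  Longitude   = c(-77.1, -77.2, -77.0, -77.2, -77.1)
)

keep <- c("Other", "Crime Against Property", "Crime Against Society",
          "Not a Crime", "Crime Against Person")

toy_filtered <- toy %>%
  na.omit() %>%                      # drops the two rows containing NA
  filter(Crime.Name1 %in% keep)      # drops the row with the empty category

toy_filtered
```

Note that na.omit() only removes rows with NA values. An empty string is not NA, so in this sketch it is the filter() step that removes it.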

Plotting Data in a Bar Graph

If we make a call to ggplot() and choose the data we will be working with, it will create an empty plot. Even though we specified what data we will be working with, we haven’t told ggplot what to do with that data.

> ggplot(crime_filtered)

Now we are going to create a bar graph. In ggplot2 we can use two functions to create bar graphs: geom_bar() and geom_col(). The two functions are similar; however, geom_bar() counts the rows in each category and uses that count as the y-axis value, while geom_col() lets us supply the y-axis value ourselves.

> ggplot(crime_filtered) + geom_bar(mapping=aes(x=Crime.Name1, fill = Crime.Name1))
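
To make the difference between the two functions concrete, here is a sketch that draws the same bars both ways on a small invented data set. The counts for geom_col() are computed with base R's table(); that is just one way to do it.

```r
library(ggplot2)

# Toy data: one row per (invented) incident.
toy <- data.frame(
  Crime.Name1 = c("Other", "Other", "Crime Against Person",
                  "Crime Against Property", "Crime Against Property",
                  "Crime Against Property")
)

# geom_bar() counts the rows in each category itself.
p_bar <- ggplot(toy) +
  geom_bar(aes(x = Crime.Name1))

# geom_col() expects the bar heights to already be a column in the data.
counts <- as.data.frame(table(Crime.Name1 = toy$Crime.Name1))
p_col <- ggplot(counts) +
  geom_col(aes(x = Crime.Name1, y = Freq))
```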

ggplot2 lets us customize our plot and make it look exactly the way we want. First, we are going to remove both the plot and panel backgrounds.

> ggplot(crime_filtered) + geom_bar(mapping=aes(x=Crime.Name1, fill = Crime.Name1)) + 
+   theme(plot.background = element_blank()) +
+   theme(panel.background = element_blank())

This makes the plot a little harder to read, so we will add grey y-axis grid lines.

> ggplot(crime_filtered) + geom_bar(mapping=aes(x=Crime.Name1, fill = Crime.Name1)) + 
+   theme(plot.background = element_blank()) +
+   theme(panel.background = element_blank()) +
+   theme(panel.grid.major.y = element_line(color="grey"))
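
Because a ggplot is an ordinary R object, we do not have to retype the whole call at every step. As a sketch on invented data, the plot can be stored once and extended, and several theme settings can go into a single theme() call:

```r
library(ggplot2)

# Toy data standing in for crime_filtered.
toy <- data.frame(Crime.Name1 = c("Other", "Other", "Crime Against Person"))

# Store the base plot once...
p <- ggplot(toy) +
  geom_bar(aes(x = Crime.Name1, fill = Crime.Name1))

# ...then add the theme settings to the stored object in one theme() call.
p_clean <- p +
  theme(
    plot.background    = element_blank(),
    panel.background   = element_blank(),
    panel.grid.major.y = element_line(color = "grey")
  )
```

Typing p_clean at the console draws the plot.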

It is still hard to read the plot, so we will change the names of the scales. The x-axis shows a discrete variable, so we will use the scale_x_discrete() function, whereas the y-axis is continuous and requires scale_y_continuous(). We can then change the names and set limits for the y-axis.

> ggplot(crime_filtered) + geom_bar(mapping=aes(x=Crime.Name1, fill = Crime.Name1)) + 
+   theme(plot.background = element_blank()) +
+   theme(panel.background = element_blank()) +
+   theme(panel.grid.major.y = element_line(color="grey")) +
+   scale_x_discrete(name = "Type of Crime") +
+   scale_y_continuous(name = "Number of Crimes", limits = c(0, 30000))
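
One caveat with scale limits: values outside the limits are treated as missing, so a bar taller than the upper limit would be dropped rather than cut off. The sketch below uses invented data and a deliberately low limit of 3 to show the difference from coord_cartesian(), which only zooms the view.

```r
library(ggplot2)

# Toy data: 5 rows in one category, 2 in the other.
toy <- data.frame(
  Crime.Name1 = c(rep("Other", 5), rep("Crime Against Person", 2))
)

p <- ggplot(toy) +
  geom_bar(aes(x = Crime.Name1)) +
  labs(x = "Type of Crime", y = "Number of Crimes")

# Scale limits censor out-of-range values: the 5-row bar is dropped
# (with a warning) when the plot is drawn.
p_scale <- p + scale_y_continuous(limits = c(0, 3))

# coord_cartesian() only zooms: the 5-row bar stays and runs off the top.
p_zoom <- p + coord_cartesian(ylim = c(0, 3))
```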

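Now we are going to modify our legend: we pick a color palette from the jcolors package, arrange the legend keys in a single row, and move the legend below the plot.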

> ggplot(crime_filtered) + geom_bar(mapping=aes(x=Crime.Name1, fill = Crime.Name1)) + 
+   theme(plot.background = element_blank()) +
+   theme(panel.background = element_blank()) +
+   theme(panel.grid.major.y = element_line(color="grey")) +
+   scale_x_discrete(name = "Type of Crime") +
+   scale_y_continuous(name = "Number of Crimes", limits = c(0, 30000)) +
+   scale_fill_jcolors("pal10", guide = guide_legend(title = "Type of Crime", nrow = 1, label.position = "bottom", keywidth = 2.5)) +
+   theme(legend.position = "bottom")

Finally, we will add the title and subtitle for our plot.

> ggplot(crime_filtered) + geom_bar(mapping=aes(x=Crime.Name1, fill = Crime.Name1)) + 
+   theme(plot.background = element_blank()) +
+   theme(panel.background = element_blank()) +
+   theme(panel.grid.major.y = element_line(color="grey")) +
+   scale_x_discrete(name = "Type of Crime") +
+   scale_y_continuous(name = "Number of Crimes", limits = c(0, 30000)) +
+   scale_fill_jcolors("pal10", guide = guide_legend(title = "Type of Crime", nrow = 1, label.position = "bottom", keywidth = 2.5)) +
+   theme(legend.position = "bottom") +
+   ggtitle("Crime in Montgomery County, MD",
+   subtitle = "Source: catalog.data.gov/dataset/crime")


Creating a Scatterplot

To create a scatterplot, we need to map Longitude and Latitude to the x- and y-axis, respectively. Instead of geom_bar(), the function for a scatterplot is geom_point().

> ggplot() +
+ geom_point(data = crime_filtered, aes(x = Longitude, y = Latitude))

Every crime is plotted based on its location; however, in areas where crime was more prevalent the points overlap, and it’s hard to judge the number of crimes there. We will make the points semi-transparent by setting the alpha argument of geom_point().

> ggplot() +
+ geom_point(data = crime_filtered, aes(x = Longitude, y = Latitude), alpha = .04)
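
It matters that alpha is set outside aes(). Outside, it is a constant applied to every point; inside, ggplot2 treats it as data and sends it through an alpha scale. A sketch on randomly generated points (the coordinates are simulated, not real crime locations):

```r
library(ggplot2)

# Simulated points clustered around the county's approximate coordinates.
set.seed(1)
toy <- data.frame(
  Longitude = rnorm(200, mean = -77.2, sd = 0.05),
  Latitude  = rnorm(200, mean = 39.2, sd = 0.05)
)

# Outside aes(): every point is drawn with exactly this transparency.
p_fixed <- ggplot(toy) +
  geom_point(aes(x = Longitude, y = Latitude), alpha = 0.04)

# Inside aes(): the value is mapped through an alpha scale and gets a legend,
# which is usually not what we want for a constant.
p_mapped <- ggplot(toy) +
  geom_point(aes(x = Longitude, y = Latitude, alpha = 0.04))
```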


Creating a map

Before we begin to work with ggmap, we need to obtain and register a Google Maps API key.

> register_google(key = "YOUR_API_KEY")
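
Hard-coding an API key in a script makes it easy to leak when the script is shared. One common alternative, sketched here, is to read the key from an environment variable. The variable name GOOGLE_MAPS_API_KEY is an assumption chosen for this example, not something ggmap requires.

```r
library(ggmap)

# Hypothetical variable name; set it in ~/.Renviron or the shell beforehand.
api_key <- Sys.getenv("GOOGLE_MAPS_API_KEY")

if (nzchar(api_key)) {
  register_google(key = api_key)
} else {
  message("No API key found; geocode() and get_map() calls to Google will fail.")
}
```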

ggmap has three main functions: geocode() converts a place name into coordinates, get_map() retrieves the map, and ggmap() plots it. Let’s geocode Montgomery County, MD.

> montgomerycounty <- geocode("Montgomery County, MD")
Source : https://maps.googleapis.com/maps/api/geocode/json?address=Montgomery+County,+MD&key=xxx
> montgomerycounty
# A tibble: 1 x 2
    lon   lat
  <dbl> <dbl>
1 -77.2  39.2

We can retrieve the map itself by making a call to the get_map() function and specifying the location, which get_map() geocodes for us. We can also change the maptype from the default ‘terrain’ to ‘hybrid’ and change the zoom.

> map_montgomerycounty <- get_map("Montgomery County, MD", maptype = "hybrid", zoom = 10)
Source : https://maps.googleapis.com/maps/api/staticmap?center=Montgomery%20County,%20MD&zoom=10&size=640x640&scale=2&maptype=hybrid&language=en-EN&key=xxx
Source : https://maps.googleapis.com/maps/api/geocode/json?address=Montgomery+County,+MD&key=xxx
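
The second Source line shows that get_map() geocoded the place name again. Since get_map() also accepts a longitude/latitude pair, the coordinates we already have could be reused instead. A sketch, using the rounded values from the geocode() printout; the get_map() call itself is left commented out because it needs a registered key and network access:

```r
# Coordinates as printed by geocode() above (rounded in the tibble printout).
montgomerycounty <- data.frame(lon = -77.2, lat = 39.2)

# get_map() takes the pair in longitude, latitude order.
center <- c(lon = montgomerycounty$lon, lat = montgomerycounty$lat)
center

# Requires a registered key and network access:
# map_montgomerycounty <- ggmap::get_map(location = center,
#                                        maptype = "hybrid", zoom = 10)
```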

Let’s look at the map of Montgomery County, MD.

> ggmap(map_montgomerycounty)

ggmap is very similar to ggplot2 and uses the same aesthetics. We can use the same code as we did for our scatterplot.

> ggmap(get_map(montgomerycounty)) + 
+   geom_point(mapping = aes(x=Longitude, y= Latitude), data = crime_filtered)
Source : https://maps.googleapis.com/maps/api/staticmap?center=39.154743,-77.240515&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&key=xxx

Let’s make a call to ggmap() on ‘map_montgomerycounty’ and change the color of the points to red so they are easier to see on the hybrid map. We can also modify the transparency.

> ggmap(map_montgomerycounty) + 
+   geom_point(data = crime_filtered, aes(x = Longitude, y = Latitude), alpha = .04, color = "red")


Density map

Next, we want to create a heat map, which is basically a two-dimensional density plot.

> ggplot() +
+   stat_density2d(data = crime_filtered, aes(x = Longitude, y = Latitude, fill = ..density..), geom = 'tile', contour = F) 
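
The ..density.. notation works with the ggplot2 version loaded above (3.1.1). In ggplot2 3.3.0 and later the same computed variable is written as after_stat(density), and the dotted form has since been deprecated. A sketch of the newer spelling on simulated points (not the real crime data):

```r
library(ggplot2)

# Simulated points clustered around the county's approximate coordinates.
set.seed(42)
toy <- data.frame(
  Longitude = rnorm(500, mean = -77.2, sd = 0.05),
  Latitude  = rnorm(500, mean = 39.2, sd = 0.05)
)

# after_stat(density) refers to the density column computed by the stat.
p_density <- ggplot(toy) +
  stat_density_2d(
    aes(x = Longitude, y = Latitude, fill = after_stat(density)),
    geom = "tile", contour = FALSE
  )
```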

Since ggplot2 allows us to build the visualizations in layers, we can overlay the density map on top of the geographic map.

> ggmap(map_montgomerycounty) + 
+   stat_density2d(data = crime_filtered, aes(x = Longitude, y = Latitude, fill = ..density..), geom = 'tile', contour = F, alpha = .5)

Finally, let’s add some colors and the title!

> ggmap(map_montgomerycounty) + 
+   stat_density2d(data = crime_filtered, aes(x = Longitude, y = Latitude, fill = ..density..), geom = 'tile', contour = F, alpha = .5) +
+   scale_fill_viridis(option = "inferno") +
+   labs(title = "Crime in Montgomery County, MD",
+   subtitle = "Source: catalog.data.gov/dataset/crime",
+   fill = "Crime Density")